Gemini 3 Pro support and cross-model conversation compatibility #2158
Conversation
- Bump litellm dependency to >= 1.80.7 for Gemini thought signatures support
- Add Gemini 3 Pro thought_signature support for function calling
- Handle both LiteLLM provider_specific_fields and Gemini extra_content formats
- Clean up __thought__ suffix on tool call ids for Gemini models
- Attach provider_data to all non-Responses output items
- Store model, response_id and provider specific metadata
- Store Gemini thought_signature on function call items
- Use provider_data.model to decide what data is safe to send per provider
- Keep handoff transcripts stable by hiding provider_data in history output
Thanks for sending this PR! Overall, the design is clean and the code looks good to go. If anyone could try this branch out and share early feedback before releasing it, it would be greatly appreciated.

I am currently working on the 0.6.4 release. This one can be included in 0.7.0 or later, so please wait a moment!
markmcd left a comment:
Hi 👋 - I'm from the Gemini team, just took a quick pass over the code to see how it works with our implementation of thought re-circulation and everything here LGTM. One FYI comment on parallel tool calls but no action is required.
```python
continue

# Default to skip validator, overridden if valid thought signature exists
tool_call["provider_specific_fields"] = {
```
FYI - in the context of parallel tool-calls, this adds the dummy signature to every tool call returned. In the docs, we specify that a dummy signature is to be provided on the first tool call; however, it is safe to apply it to all of them, so no need to change anything.
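For illustration, a minimal sketch of the behavior discussed here (the placeholder value and helper name are illustrative, not the actual constants used in this branch):

```python
# Illustrative only: give every parallel tool call a signature, falling back to a
# dummy value that tells Gemini to skip signature validation for that call.
DUMMY_THOUGHT_SIGNATURE = "skip_thought_signature_validator"  # placeholder value, not the real one

def attach_signatures(tool_calls: list[dict], signatures_by_id: dict[str, str]) -> None:
    for tool_call in tool_calls:
        signature = signatures_by_id.get(tool_call["id"], DUMMY_THOUGHT_SIGNATURE)
        tool_call["provider_specific_fields"] = {"thought_signature": signature}
```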
Resolves:
Summary
This PR does two main things:
1. Adds Gemini 3 Pro support for `thought_signatures` in function calling.
2. Enables cross-model conversations.

The goal is to make different providers interoperable: allowing them to safely share the same `to_input_list()` items, while each provider only receives the metadata it understands.

Examples
Besides unit tests, I performed live tests for all the following scenarios:
- LiteLLM + Gemini
- Gemini ChatCompletions (OpenAI-compatible endpoint)
- Cross-model conversations (same raw items handled by different models)
- Handoffs (disabled `nest_handoff_history`)

1. Gemini 3 Pro function calling (`thought_signatures`)

Gemini 3 Pro now requires a `thought_signature` attached to each function call in the same turn.
Docs: https://ai.google.dev/gemini-api/docs/thought-signatures
This PR supports both integration paths (LiteLLM, and Gemini's OpenAI-compatible ChatCompletions endpoint), in both non-streaming and streaming modes.
The conversation flow is: LiteLLM ↔ ChatCompletions ↔ our raw items.
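For reference, the two paths can be exercised roughly like this (the Gemini model id and environment variable name are assumptions, and the weather tool is just a stand-in):

```python
import os
from openai import AsyncOpenAI
from agents import Agent, Runner, OpenAIChatCompletionsModel, function_tool
from agents.extensions.models.litellm_model import LitellmModel

GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]  # assumed env var name
MODEL_ID = "gemini-3-pro-preview"              # assumed Gemini 3 Pro model id

@function_tool
def get_weather(city: str) -> str:
    """Return a canned weather report for the given city."""
    return f"The weather in {city} is sunny."

# Path 1: LiteLLM
litellm_agent = Agent(
    name="Assistant",
    model=LitellmModel(model=f"gemini/{MODEL_ID}", api_key=GEMINI_API_KEY),
    tools=[get_weather],
)

# Path 2: Gemini's OpenAI-compatible ChatCompletions endpoint
chat_agent = Agent(
    name="Assistant",
    model=OpenAIChatCompletionsModel(
        model=MODEL_ID,
        openai_client=AsyncOpenAI(
            base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
            api_key=GEMINI_API_KEY,
        ),
    ),
    tools=[get_weather],
)

# Either agent can be run the same way.
result = Runner.run_sync(litellm_agent, "What's the weather in Tokyo?")
print(result.final_output)
```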
LiteLLM layer
LiteLLM places Gemini's `thought_signature` inside `provider_specific_fields`. This PR handles the conversion between:

- LiteLLM's `provider_specific_fields["thought_signature"]`, and
- the Google ChatCompletions format: `extra_content={"google": {"thought_signature": ...}}`
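Concretely, the same tool call carries the signature in two different places depending on the layer (shapes are illustrative):

```python
# LiteLLM tool call: the signature lives in provider_specific_fields.
litellm_tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'},
    "provider_specific_fields": {"thought_signature": "base64-signature"},
}

# Google ChatCompletions (OpenAI-compatible) tool call: the signature lives in extra_content.
chatcompletions_tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'},
    "extra_content": {"google": {"thought_signature": "base64-signature"}},
}
```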
ChatCompletions layer

This PR handles the conversion between:

- the Google ChatCompletions format: `extra_content={"google": {"thought_signature": ...}}`, and
- our raw item's internal new field: `provider_data["thought_signature"]`
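A simplified sketch of that step (not the exact converter code in this branch):

```python
def extract_provider_data(chat_tool_call: dict) -> dict:
    """Pull the Gemini thought signature out of a ChatCompletions tool call
    so it can be stored as provider_data on our raw function call item."""
    google = (chat_tool_call.get("extra_content") or {}).get("google", {})
    provider_data: dict = {}
    if "thought_signature" in google:
        provider_data["thought_signature"] = google["thought_signature"]
    return provider_data


def restore_extra_content(raw_item: dict) -> dict | None:
    """Inverse direction: rebuild extra_content from provider_data when
    sending the item back to Gemini."""
    signature = (raw_item.get("provider_data") or {}).get("thought_signature")
    if signature is None:
        return None
    return {"google": {"thought_signature": signature}}
```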
Cleaning up LiteLLM's `__thought__` suffix

LiteLLM adds a `__thought__` suffix to Gemini tool call ids (see BerriAI/litellm#16895). This suffix is not needed since we have `thought_signature`, and it causes call_id validation problems when the items are passed to other models. Therefore, this PR removes it.
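The cleanup is essentially a suffix strip (a simplified sketch; the exact id format LiteLLM produces may differ):

```python
LITELLM_THOUGHT_MARKER = "__thought__"

def clean_gemini_call_id(call_id: str) -> str:
    """Drop LiteLLM's __thought__ marker (and anything after it) from a tool call id."""
    if LITELLM_THOUGHT_MARKER in call_id:
        return call_id.split(LITELLM_THOUGHT_MARKER, 1)[0]
    return call_id
```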
2. Enables cross-model conversations
To support cross-model conversations, this PR introduces a new `provider_data` field on raw response items. This field holds metadata that is not compatible with the OpenAI Responses API, allowing us to keep provider-specific details (originating model, response id, thought signatures, etc.) on the items they belong to, and to decide per provider what is safe to send back.

For non–OpenAI Responses API models, we now store this on the raw item:
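Roughly, the stored metadata looks like this (a sketch based on the fields listed in the commit summary; exact keys may differ):

```python
# Example raw function call item after a Gemini (LiteLLM / ChatCompletions) turn.
function_call_item = {
    "type": "function_call",
    "name": "get_weather",
    "arguments": '{"city": "Tokyo"}',
    "call_id": "call_abc123",
    "provider_data": {
        "model": "gemini-3-pro-preview",          # which provider/model produced the item
        "response_id": "resp_xyz",                # provider response id (not a Responses API id)
        "thought_signature": "base64-signature",  # Gemini function calling signature
    },
}
```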
This design is like PydanticAI's, which uses a similar structure. The difference: PydanticAI stores metadata for all models, whereas this PR stores `provider_data` only for non-OpenAI providers.

With `provider_data` and the model name passed into the converters, agents can now safely switch models while reusing the same raw items from `to_input_list()`. It also works with handoffs when `nest_handoff_history=False`.
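For example, cross-model reuse of a conversation can look like this (model ids are illustrative):

```python
import asyncio
from agents import Agent, Runner
from agents.extensions.models.litellm_model import LitellmModel

async def main() -> None:
    gemini_agent = Agent(
        name="Assistant",
        model=LitellmModel(model="gemini/gemini-3-pro-preview"),  # assumed model id
    )
    openai_agent = Agent(name="Assistant", model="gpt-4.1")

    # First turn on Gemini; provider_data is stored on the resulting items.
    first = await Runner.run(gemini_agent, "Summarize the weather in Tokyo.")

    # Second turn reuses the same raw items with a different provider.
    history = first.to_input_list() + [
        {"role": "user", "content": "Now translate that into French."}
    ]
    second = await Runner.run(openai_agent, history)
    print(second.final_output)

asyncio.run(main())
```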
Implementation Details
Because items in a conversation can come from different providers, and each provider has different requirements, this PR passes the target model name into several conversion helpers:
- `Converter.items_to_messages(..., model=...)`
- `LitellmConverter.convert_message_to_openai(..., model=...)`
- `ChatCmplStreamHandler.handle_stream(..., model=...)`
- `Converter.message_to_output_items(..., provider_data=...)`

This lets us branch on behavior for different providers in a controlled way and avoid regressions by handling provider-specific cases explicitly. This is especially important for reasoning models, where each provider handles encrypted tokens differently.
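Inside the converters, this enables provider-specific branches along these lines (illustrative, not the exact code in this branch):

```python
def is_gemini(model: str | None) -> bool:
    return bool(model) and "gemini" in model.lower()

def maybe_attach_thought_signature(chat_tool_call: dict, raw_item: dict, model: str | None) -> None:
    """Only Gemini needs the thought_signature echoed back on its function calls."""
    if not is_gemini(model):
        return
    signature = (raw_item.get("provider_data") or {}).get("thought_signature")
    if signature:
        chat_tool_call["extra_content"] = {"google": {"thought_signature": signature}}
```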
Libraries like PydanticAI and LangChain define their own internal standard formats to enable cross-model conversations.

By contrast, LiteLLM has not fully abstracted away these differences. It focuses on making each model call work with provider-specific workarounds, without defining a normalized history format for cross-model conversations. Therefore, we need explicit model-aware handling at this layer to make cross-model conversations possible.
For example, when we store Claude's `thinking_blocks` signature inside our reasoning item's `encrypted_content` field, we also need to know that it came from a Claude model. Otherwise, we would send this Claude-only encrypted content to another provider, which cannot safely interpret it.
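A minimal sketch of that check, assuming `provider_data` records the originating model (illustrative only):

```python
# Only replay Claude-specific encrypted reasoning to Claude models.
def should_send_encrypted_content(reasoning_item: dict, target_model: str) -> bool:
    source_model = (reasoning_item.get("provider_data") or {}).get("model", "")
    return "claude" in source_model.lower() and "claude" in target_model.lower()
```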
The guiding principle in this PR is to treat OpenAI Responses API items as the baseline format, and use `provider_data` to extend them with provider-specific metadata when needed.

For OpenAI Responses API:
When sending items to the OpenAI Responses API, we must not send provider-specific metadata or fake ids.
This PR adds:
`OpenAIResponsesModel._remove_openai_responses_api_incompatible_fields(...)`, which:

- drops `provider_data.id` when it equals `FAKE_RESPONSES_ID`.
- drops the metadata stored in `provider_data` (these are provider-specific).
- removes the `provider_data` field itself from all items.

This keeps the payload clean and compatible with the Responses API, even if the items previously flowed through non-OpenAI providers.
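A sketch of what that cleanup step does (the SDK's own constant and method differ in detail; the sentinel value is assumed here):

```python
FAKE_RESPONSES_ID = "__fake_id__"  # assumed sentinel value

def remove_responses_api_incompatible_fields(items: list[dict]) -> list[dict]:
    """Strip metadata the Responses API would reject before sending history back."""
    cleaned: list[dict] = []
    for item in items:
        item = dict(item)  # shallow copy so the stored history stays intact
        provider_data = item.pop("provider_data", None) or {}
        if item.get("id") == FAKE_RESPONSES_ID or provider_data.get("id") == FAKE_RESPONSES_ID:
            item.pop("id", None)  # never send fake ids to the Responses API
        cleaned.append(item)
    return cleaned
```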
Design notes: reasoning items vs provider_data
This PR does not introduce a separate reasoning item (as is done for Claude's `thinking_blocks`) for Gemini function calls' `thought_signatures`. Instead it stores the signatures in `provider_data` on the function call item.
This design is again similar to PydanticAI’s approach and also mirrors the underlying Gemini parts structure: signatures are attached to the parts they describe instead of creating an extra reasoning item with no text.
I also looked at the Gemini API raw format; there are four raw part structures that carry `thought_signature`:

1. `functionCall: {...}` with `thought_signature: "xxx"` → handled in this PR: keep the thought_signature with the function call.
2. `text: "...."` with `thought_signature: "xxx"` → could attach to the output item (no extra reasoning item needed).
3. `text: ""` with `thought_signature: "xxx"` → (empty text) this is the case where a standalone reasoning item makes sense.
4. `text: "summary..."` with `thought: true` → (this is a thinking summary) another case where a standalone reasoning item makes sense.

This PR implements case (1), which is sufficient for Gemini's current function calling requirement.
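For case (1), the raw part looks roughly like this (field names follow the description above; values are illustrative):

```python
# Case (1): a Gemini raw part carrying a thought signature on a function call.
gemini_part = {
    "functionCall": {"name": "get_weather", "args": {"city": "Tokyo"}},
    "thought_signature": "xxx",
}
# This PR keeps that signature on the converted function call item,
# under provider_data["thought_signature"] (see the example earlier).
```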
Other cases can be added later if needed.
This PR should have no side effects on projects that only use the OpenAI Responses API, and I believe it establishes a better groundwork for handling various provider-specific cases.